ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention

Inverse Protein Folding (IPF) is an important task in protein design, which aims to design sequences compatible with a given backbone structure. Despite the rapid development of algorithms for this task, existing methods tend to rely on noisy predicted residues located in the local neighborhood when generating sequences. To address this limitation, we propose an entropy-based residue selection method to remove noise from the input residue context. Additionally, we introduce ProRefiner, a memory-efficient global graph attention model that fully utilizes the denoised context. Our proposed method achieves state-of-the-art performance on multiple sequence design benchmarks across different design settings. Furthermore, we demonstrate the applicability of ProRefiner in redesigning Transposon-associated transposase B, where six out of the 20 variants we propose exhibit improved gene editing activity.
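For illustration, below is a minimal sketch of what entropy-based residue selection could look like: per-residue entropy is computed from a base model's predicted amino-acid distribution, and only the most confident predictions are kept as sequence context. The function name, the keep ratio, and the use of a fixed ratio rather than a threshold are assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

def select_low_entropy_residues(probs, keep_ratio=0.5):
    """Keep the base-model predictions whose per-residue entropy is lowest.

    probs: (L, 20) array of per-residue amino-acid probabilities from a base
    inverse-folding model. Returns a boolean mask over the L residues marking
    which predictions are confident enough to serve as denoised context.
    The 0.5 keep ratio is an illustrative choice, not the paper's value.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-9), axis=-1)   # (L,)
    n_keep = int(len(entropy) * keep_ratio)
    keep_idx = np.argsort(entropy)[:n_keep]                    # most confident residues
    mask = np.zeros(len(entropy), dtype=bool)
    mask[keep_idx] = True
    return mask
```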

Reviewer #1 (Remarks to the Author): First, the scope of memory-efficient graph attention is much broader than the problem of inverse protein folding (IPF). For example, AlphaFold2's row attention includes a bias term accounting for edge features, which scales quadratically, and its pair-representation track scales cubically (to account for the triangle inequality). It would be interesting in future work to see how memory-efficient graph attention could be applied to tasks beyond IPF. Despite the interesting ideas introduced in the manuscript, I had difficulty following the flow of the discussion and, more importantly, the exact meaning of 'our model.' For example, the 'memory-efficient global graph attention mechanism' discussion in the main text, starting on page 4, is not particularly informative. Either it targets readers without prior knowledge of vanilla attention, in which case I do not think they can follow much of the argument, or it targets those familiar with the attention mechanism, for whom the discussion is relatively shallow.
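To make the scaling argument concrete, here is a minimal single-head sketch (not AlphaFold2's actual code; learned projections and multiple heads are omitted) of attention with an additive pair bias. Materializing the dense pair/edge term is what makes the memory footprint quadratic in the number of residues.

```python
import torch

def pair_biased_attention(x, pair_bias):
    """Single-head attention with an additive pair bias, in the spirit of
    AlphaFold2-style row attention.

    x:         (N, d) node (residue) features
    pair_bias: (N, N) bias derived from edge/pair features; storing this dense
               term is what makes memory quadratic in N.
    """
    d = x.shape[-1]
    q, k, v = x, x, x                           # illustrative: learned projections omitted
    logits = q @ k.T / d ** 0.5 + pair_bias     # (N, N)
    attn = torch.softmax(logits, dim=-1)
    return attn @ v                             # (N, d)
```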
Returning to the 'model' in the Results section, I do not understand what is reported in Table 1. On top of reporting the performance of existing models (GVP, ProteinMPNN, and ESM), I would expect one row reporting the results of the model that combines the entropy-based approach and the memory-efficient graph attention (with and without partial sequences). I do not understand the meaning of 'Ours GVP-GNN,' and the same for ProteinMPNN and ESM. Moreover, what is the meaning of 'Ours' in Table 2? For example, is 'Ours ESM' trained using the same regimen as the original ESM, using 12M additional structures predicted from AlphaFold2? On top of recovery and nssr, it would be interesting to report the perplexity. If I understand correctly what the authors mean by noise, this could be related to the perplexity score.
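For reference, a minimal sketch of the perplexity metric this comment refers to: the exponential of the mean negative log-likelihood the model assigns to the native residues. The function name and the (L, 20) probability layout are assumptions for illustration only.

```python
import numpy as np

def perplexity(probs, native_idx):
    """Per-protein perplexity of a design model.

    probs:      (L, 20) predicted amino-acid probabilities per residue
    native_idx: (L,)    indices of the native amino acids
    Lower perplexity means the model is more confident about the native
    sequence, which relates to how 'noisy' its predictions are.
    """
    nll = -np.log(probs[np.arange(len(native_idx)), native_idx] + 1e-9)
    return float(np.exp(nll.mean()))
```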
I am unsure of the difference between the results in Table 1 and Fig. 3A. I understand that Table 1 reports results about partial sequences and Fig. 3A about whole sequences.
The main results reported in Table 1 should be about the entire sequence case (and not partial sequence), as this is the standard in the literature.
The ablation study assessing the relative importance of global attention and partial input is run only on EnzBench and BR_EnzBench. Running such an ablation on CATH would have been more interesting, as it would allow a direct comparison with existing models.
The global graph attention model updates the node and edge features. I have a comment regarding the edge update in Equation 11. A naïve update like the one in Eq. 11 may violate the triangle inequality; i.e., edge features that have to fulfill the triangle inequality are not independent. Such a condition is enforced in AlphaFold2 by considering a cubic update: roughly, an edge e_ij is updated conditioned on e_ik and e_kj for all k. I am curious whether the authors have considered such constraints.
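As an illustration of the kind of cubic, triangle-aware update this comment alludes to, here is a simplified sketch in the spirit of AlphaFold2's triangle multiplication (not the module itself; gating and normalization are omitted).

```python
import torch

def triangle_style_edge_update(e):
    """Update every edge e_ij using all paths i -> k -> j.

    e: (N, N, d) edge/pair features. The einsum sums e_ik * e_kj over the
    intermediate node k, so each edge is updated conditioned on the edges it
    must be geometrically consistent with.
    """
    update = torch.einsum('ikd,kjd->ijd', e, e)   # out[i, j] = sum_k e[i, k] * e[k, j]
    return e + update / e.shape[0]                # residual with a simple 1/N scaling
```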
Reviewer #2 (Remarks to the Author): In the manuscript, the authors proposed a refinement model for inverse protein folding, which designs sequences that are consistent with a given backbone structure and partial sequence information. The model utilizes an entropy-based masking strategy to construct partial sequence context for later sequence prediction. In addition, the model introduces pseudo edge features shared by non-existing edges in its graph model, which significantly reduces the memory usage compared to fully connected graph neural networks. The potential significance of this manuscript comes from two key aspects: first, improvements in the inverse folding process facilitate structure-based de novo protein design; second, a memory-efficient graph neural network of equivalent model performance offers a powerful mechanism for reasoning with larger proteins.
However, there are two major concerns with the conclusions drawn from this manuscript: 1. The authors evaluated model performance based on sequence recovery rate and native sequence similarity recovery. However, it is unclear how well these designed sequences fold into the target structures. It is possible that a model with a lower sequence recovery rate generates sequence designs that better match the input target structures. Thus, further in-silico structural validation is essential for demonstrating the effectiveness of the proposed refinement method. 2. The memory-efficient global attention layer utilizes a shared pseudo edge feature for non-existing edges in order to save memory. While the authors illustrate the model's memory advantage in Figure 6E, it is crucial to compare the predictive performance between the memory-efficient global attention layer and the original global attention layer (and preferably across several evaluation tasks). Given the information provided in the manuscript, the effectiveness and usefulness of the proposed memory-efficient global attention layer remain inconclusive.
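For context, a minimal sketch of what a shared-pseudo-edge attention layer could look like, assuming scalar edge biases for simplicity (the manuscript's layer uses richer edge features): only k-NN edges carry individual biases, every other pair shares a single learned value, so edge-feature storage grows with N*k rather than N^2 while attention remains global.

```python
import torch

def global_attention_with_pseudo_edge(x, knn_idx, knn_edge_bias, pseudo_bias):
    """Global attention in which only k-NN edges carry individual biases.

    x:             (N, d)  node features
    knn_idx:       (N, k)  long tensor of each node's k nearest neighbours
    knn_edge_bias: (N, k)  per-edge biases for existing (k-NN) edges
    pseudo_bias:   float   one shared bias standing in for all non-existing edges

    Edge-feature storage is O(N*k) plus a single shared value instead of O(N^2),
    yet every node still attends to every other node.
    """
    N, d = x.shape
    logits = x @ x.T / d ** 0.5                        # (N, N) global attention logits
    bias = torch.full((N, N), float(pseudo_bias))      # non-existing edges share one bias
    bias.scatter_(1, knn_idx, knn_edge_bias)           # real edges keep their own bias
    attn = torch.softmax(logits + bias, dim=-1)
    return attn @ x
```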
In addition, there are several points that require further clarification: 1. Details on how base models are trained and evaluated for both entire sequence design and partial sequence design are needed to better understand the performance of the proposed refinement model. For example, ProteinMPNN, to some degree, exhibits a tradeoff between sequence recovery rate and in-silico success rate, controlled by the training noise level. Thus, it is important to provide the backbone noise level for the ProteinMPNN model used. 2. For full sequence inverse folding, the model relies on base models to generate the partial sequence. What is the strength of the entropy-based mask? What percentage of residues is provided as context for the later model? 3. For partial sequence design, is ProteinMPNN retrained with the partial sequence provided as input? If not, how are GVP-GNN and ProteinMPNN used for partial sequence design? Moreover, how does ESM perform on partial sequence design? 4. At training time, what percentage of the residues are masked as unknown? 5. How much overlap is there among the three evaluation datasets?
Reviewer #3 (Remarks to the Author): Zhou et al. propose a methodology for improving inverse protein folding. To demonstrate the new method, a proof of principle is performed on TnpB, a very interesting programmable nuclease with a compact size.
#general comments 1. TnpB mutant characterization would benefit from more detailed analysis: - Off-target activity. The authors hypothesize that a more positive surface increases activity. However, one concerning possibility is a drop in specificity. Indeed, engineering non-specific charge residues has been used to refine the specificity of the related Cas9 protein (see Slaymaker et al., Science 2016). The authors may need to compare the off-target activity of these nucleases to WT, or at least mention this risk of increased off-target activity.
- On-target assessment of programmable nuclease activity was performed with next-generation sequencing followed by indel counting. More details would need to be provided. Additionally, it would be ideal to use an established methodology to perform this analysis, such as CRISPRESSO, CRISPR-A, CRISPRseek, ..., due to the complexities associated with gene editing efficiency assessment.
Reviewer #2 (Remarks to the Author): Thank you for addressing my concerns from my previous comments and validating the pipeline with additional experiments. Overall, this paper introduces a refinement model that further improves inverse folding performance. In addition, the memory-efficient global attention mechanism provides a possible way to reduce memory usage without significantly trading off the performance of global attention. There are two minor points that would require further clarification: 1. How is AlphaFold2 used for structural evaluation? Does it utilize only the MSA of the designed sequence, or does it utilize both the MSA and searched templates? 2. Comparing the results in Supplementary Table 4 with the results in Supplementary Table 3 (or Supplementary Table 5), there is a significant drop in structural recovery performance in terms of both TM-score and RMSD. What are the main reasons behind this drop in performance? Also, it would be useful to confirm whether this is the effect of switching datasets or the effect of switching structure prediction models.
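As a concrete example of the kind of structural check behind these numbers, here is a minimal sketch of CA RMSD after optimal superposition (Kabsch); producing the predicted coordinates (e.g., with AlphaFold2) and computing the TM-score are assumed to be handled by external tools.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (L, 3) CA coordinate sets after optimal superposition.

    P could be the predicted structure of a designed sequence and Q the target
    backbone; a low RMSD (together with a high TM-score from an external tool)
    indicates the design is structurally consistent with the target.
    """
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(P.T @ Q)
    sign = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, sign])                 # avoid an improper rotation (reflection)
    R = U @ D @ Vt                                # optimal rotation (Kabsch)
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=-1))))
```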
Reviewer #3 (Remarks to the Author): The authors' revisions solved all my concerns. No more comments from my side.